2018-09-01
Data scientist @ funda
Applied Statistics
thatssorandom.com
@edwin_thoen
CRAN: padr, GGally, recipes
Who:
We don't have to use R when using R!
mtcars$cyl_drat <- mtcars$cyl + mtcars$drat
Instead of this, we can do
library(dplyr)
mtcars <- mtcars %>% mutate(cyl_drat = cyl + drat)
or
mtcars_dt <- data.table::as.data.table(mtcars)
mtcars_dt[, cyl_drat := cyl + drat]
When you started using R, did you mix these up?
install.packages("padr")
and
library(padr)
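The mix-up is understandable: install.packages() wants a character string, while library() quotes its argument and takes the bare name as the package name. A minimal sketch of that difference, assuming the always-available stats package as a stand-in and a purely illustrative variable pkg:

```r
pkg <- "stats"

# library() quotes its argument, so this looks for a package that is
# literally called "pkg" and fails (assuming no such package is installed).
quoted_attempt <- tryCatch(
  library(pkg),
  error = function(e) e
)
inherits(quoted_attempt, "error")
## [1] TRUE

# character.only = TRUE switches the quoting off: now the value of pkg is used.
library(pkg, character.only = TRUE)
"package:stats" %in% search()
## [1] TRUE
```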
Apparently, things that ought not to work are working.
subset(mtcars, cyl == 6)
ggplot2::ggplot(mtcars, aes(mpg, drat)) + geom_point()
data.table::as.data.table(mtcars)[, mean(mpg), by = cyl]
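To see why these calls are surprising, compare the bare name with the same name inside subset(). This is only a sketch in base R, and it assumes a fresh session in which no object called cyl exists:

```r
# Evaluated on its own, the name cyl is looked up and not found.
bare_lookup <- tryCatch(cyl, error = function(e) conditionMessage(e))
bare_lookup
## [1] "object 'cyl' not found"

# Inside subset() the very same name resolves to the column mtcars$cyl.
six_cyl <- subset(mtcars, cyl == 6)
unique(six_cyl$cyl)
## [1] 6
```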
By creating a variable we bind a value to a name.
my_val <- 123
Binding happens in an environment, in this case the global.
Just call my name, I'll give you the value:
my_val
## [1] 123
This is evaluating the variable name.
R starts looking for the value of a name in the environment in which the name is evaluated.
x <- "a variable in the global"
a_func <- function() {
x <- "a variable in the local"
x
}
a_func()
## [1] "a variable in the local"
When R can't find it locally, it moves up to the parent environment (for a function, the environment in which the function was defined).
z <- "a variable in the global"
another_func <- function() {
z
}
another_func()
## [1] "a variable in the global"
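The parent environment is decided by where a function is defined, not by where it is called. A small sketch to make that visible; make_reader, reader and caller are illustrative names only:

```r
x <- "a variable in the global"

# reader() is defined inside make_reader(), so that is its parent environment.
make_reader <- function() {
  x <- "where reader was defined"
  function() x
}
reader <- make_reader()

# caller() has its own x, but reader() never looks there.
caller <- function() {
  x <- "where reader was called"
  reader()
}

caller()
## [1] "where reader was defined"
```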
An error is thrown when the variable can't be found anywhere along the way.
nobody_loves_me <- function() {
y
}
nobody_loves_me()
## Error in nobody_loves_me(): object 'y' not found
So this is standard evaluation in R.
Postpone judgement, store variable name in a name object.
quote(wait_for_it)
## wait_for_it
quote(wait_for_it) %>% class()
## [1] "name"
This is the act of quoting, saving a variable name to be evaluated later.
(A name is also called a symbol.)
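The two words really refer to the same kind of object, which a few base R calls can confirm. A short sketch; from_quote and from_string are illustrative names:

```r
from_quote  <- quote(wait_for_it)
from_string <- as.name("wait_for_it")

# Both routes produce the identical object.
identical(from_quote, from_string)
## [1] TRUE

# class() says "name", typeof() says "symbol".
class(from_quote)
## [1] "name"
typeof(from_quote)
## [1] "symbol"
is.symbol(from_quote)
## [1] TRUE
```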
Quoted variable names are not evaluated, so it doesn't matter if they don't exist.
quoted_var <- quote(wait_for_it)
quoted_var
## wait_for_it
quoted_var %>% class()
## [1] "name"
Look for the value only when we ask to evaluate it.
eval(quoted_var)
## Error in eval(quoted_var): object 'wait_for_it' not found
wait_for_it <- "I finally have a value"
eval(quoted_var)
## [1] "I finally have a value"
We can evaluate the name in a different environment.
pull
dplyr::pull(mtcars, cyl)
## [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
diy_pull <- function(x, name) {
eval(name, envir = x)
}
diy_pull(mtcars, quote(cyl))
## [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
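The same quoted name can give a different answer depending on where it is evaluated. A minimal sketch with two hand-made environments and a list; env_a, env_b and answer are illustrative names:

```r
quoted_var <- quote(answer)

env_a <- new.env()
env_b <- new.env()
assign("answer", "value from env_a", envir = env_a)
assign("answer", "value from env_b", envir = env_b)

eval(quoted_var, envir = env_a)
## [1] "value from env_a"
eval(quoted_var, envir = env_b)
## [1] "value from env_b"

# eval() also accepts a named list in place of an environment.
eval(quoted_var, envir = list(answer = "value from a list"))
## [1] "value from a list"
```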
A data frame can serve as an environment too. Column names act as variables.
mtcars %>% select(cyl)
as.data.table(mtcars)[, cyl]
ggplot(mtcars, aes(cyl)) + geom_bar()
Why does R not throw an error? There is no cyl in the global…
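Part of the answer is that eval() looks in the data frame first and only then falls back to the calling environment (its enclos argument, when envir is a list or data frame). A sketch of that mixed lookup; min_cyl is an illustrative variable that lives outside mtcars:

```r
min_cyl <- 6  # not a column of mtcars

# cyl is found in the data frame, min_cyl in the calling environment.
big_engines <- eval(quote(cyl > min_cyl), envir = mtcars)
sum(big_engines)
## [1] 14
```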
Would this work?
diy_pull <- function(x, bare_name) {
name <- quote(bare_name)
eval(name, envir = x)
}
diy_pull(mtcars, cyl)
## Error in eval(name, envir = x): object 'cyl' not found
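The culprit is quote(): it captures exactly what is typed between its parentheses, which here is the argument's own name, whatever the caller passed in. Evaluating that name then forces the argument, cyl is looked up in the calling environment, and the lookup fails. A sketch with an illustrative function quote_example:

```r
quote_example <- function(x) {
  quote(x)
}

# The result is always the symbol x, never what the caller supplied.
quote_example(cyl)
## x
identical(quote_example(cyl), quote_example(some_other_name))
## [1] TRUE
```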
substitute quotes the expression that was passed as the argument.
substitute_example <- function(x) {
substitute(x)
}
substitute_example(cyl)
## cyl
substitute_example(cyl) %>% class()
## [1] "name"
diy_pull <- function(x, bare_name) {
name <- substitute(bare_name)
eval(name, envir = x)
}
diy_pull(mtcars, cyl)
## [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
koala <- function(x, y) {
x + 42
}
koala(3)
## [1] 45
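R does not complain about the missing y, because koala never uses it. The equivalent Python function fails:
def koala(x, y):
    return(x + 42)
koala(3)
## TypeError: koala() takes exactly 2 arguments (1 given)
##
## Detailed traceback:
##   File "<string>", line 1, in <module>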
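R goes further than tolerating a missing argument: a supplied argument is not evaluated either until the function touches it. A sketch of that laziness; greedy_koala is a hypothetical variant added for contrast:

```r
koala <- function(x, y) {
  x + 42
}

# y is supplied, but koala never uses it, so stop() is never run.
koala(3, stop("y was evaluated"))
## [1] 45

# Touching y forces its evaluation, and only then does the error surface.
greedy_koala <- function(x, y) {
  y
  x + 42
}
result <- tryCatch(
  greedy_koala(3, stop("y was evaluated")),
  error = function(e) conditionMessage(e)
)
result
## [1] "y was evaluated"
```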
We can quote the following things:
name: the name of an R object
call: calling of a function
pairlist: something from the past you shouldn't bother about
literal: evaluates to the value itself
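Each of the four can be inspected with class(). A quick sketch in base R; the function passed to formals() is a throwaway used only to get hold of a pairlist:

```r
class(quote(cyl))          # a name
## [1] "name"
class(quote(mean(cyl)))    # a call
## [1] "call"

# Function arguments are stored as a pairlist.
class(formals(function(a, b) NULL))
## [1] "pairlist"

# Quoting a literal changes nothing: it already is its own value.
identical(quote(42), 42)
## [1] TRUE
```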
Just like a name, a function call can be delayed by quoting.
my_little_filter <- function(x, call) {
call_quoted <- substitute(call)
retain_row <- eval(call_quoted, envir = x)
x[retain_row, ]
}
my_little_filter(mtcars, cyl == 4 & gear == 4) %>% head(2)
##    mpg cyl  disp hp drat   wt  qsec vs am gear carb cyl_drat
## 3 22.8   4 108.0 93 3.85 2.32 18.61  1  1    4    1     7.85
## 8 24.4   4 146.7 62 3.69 3.19 20.00  1  0    4    2     7.69
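A quoted call is an ordinary R object that can be taken apart before it is ever evaluated. A short sketch, using the same condition as above stored in an illustrative variable:

```r
condition <- quote(cyl == 4 & gear == 4)

# The first element is the function being called, the rest are its arguments.
as.character(condition[[1]])
## [1] "&"
deparse(condition[[2]])
## [1] "cyl == 4"
deparse(condition[[3]])
## [1] "gear == 4"

# all.vars() lists the names the call will need when evaluated.
all.vars(condition)
## [1] "cyl"  "gear"
```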
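A function argument is passed as a promise: it holds the argument's expression, the environment in which to evaluate that expression, and a slot for the value.
The value slot is empty at promise creation.
Only when the function evaluates the argument's expression does R start looking for the value.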
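A counter with a side effect shows when the value slot gets filled. This is a sketch only; evaluations, count_and_return, use_twice and never_used are all illustrative names:

```r
evaluations <- 0
count_and_return <- function() {
  evaluations <<- evaluations + 1  # record every time this really runs
  42
}

# The argument is used twice, but its expression is evaluated only once:
# the second use reads the value stored in the promise.
use_twice <- function(val) {
  val + val
}
use_twice(count_and_return())
## [1] 84
evaluations
## [1] 1

# An argument that is never used is never evaluated at all.
never_used <- function(val) {
  "done"
}
never_used(count_and_return())
## [1] "done"
evaluations
## [1] 1
```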
Remember koala?
koala <- function(x, y) {
x + 42
}
When we call koala we create the following promise
x_value <- 42
koala(x = x_value)
That's how substitute works!
It accesses the expression in the promise without evaluating it.
subs_func <- function(val) {
vals_expr <- substitute(val)
deparse(vals_expr)
}
subs_func(anything_goes)
## [1] "anything_goes"
Note that deparse converts the expression to a character string. Its inverse is parse, called with its text argument.
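The round trip from expression to text and back can be sketched as follows; expr_text and parsed are illustrative names, and the built-in mtcars is used as the evaluation context:

```r
# From expression to character string ...
expr_text <- deparse(quote(cyl + drat))
expr_text
## [1] "cyl + drat"

# ... and back. parse() returns an expression vector, so take its first element.
parsed <- parse(text = expr_text)[[1]]
class(parsed)
## [1] "call"

# The parsed call evaluates like the original would have.
identical(eval(parsed, envir = mtcars), mtcars$cyl + mtcars$drat)
## [1] TRUE
```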
The tidyverse NSE dialect.
mtcars %>% select(cyl)
We now know that cyl somehow gets quoted by select and evaluated within the data frame.
But what if we want to wrap tidyverse code in a custom function?
This won't work
my_tv_func <- function(x, grouping_var) {
x %>%
group_by(grouping_var) %>%
summarise(max_drat = max(drat))
}
my_tv_func(mtcars, cyl)
Why?
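The quoting function sees the wrapper's argument name, not the expression the user typed. The same trap can be reproduced in base R with my_little_filter from earlier. This is an analogue under that assumption, not a look at dplyr's internals; naive_wrapper and row_condition are hypothetical names:

```r
my_little_filter <- function(x, call) {
  call_quoted <- substitute(call)
  retain_row <- eval(call_quoted, envir = x)
  x[retain_row, ]
}

# A wrapper that simply passes its argument on.
naive_wrapper <- function(x, row_condition) {
  my_little_filter(x, row_condition)
}

# my_little_filter() captures the symbol row_condition, not cyl == 4,
# and that symbol exists neither in mtcars nor in its own environments.
outcome <- tryCatch(
  naive_wrapper(mtcars, cyl == 4),
  error = function(e) conditionMessage(e)
)
outcome
## [1] "object 'row_condition' not found"
```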
In order to get it to work:
my_tv_func <- function(x, grouping_var) {
x %>%
group_by(!!grouping_var) %>%
summarise(max_drat = max(drat))
}
my_tv_func(mtcars, quo(cyl))
Just like with substitute, you can quote the argument's expression with enquo.
my_grouping_func <- function(x, grouping_var) {
grouping_var_q <- enquo(grouping_var)
x %>%
group_by(!!grouping_var_q) %>%
summarise(max_drat = max(drat))
}
my_grouping_func(mtcars, cyl)
## # A tibble: 3 x 2
##     cyl max_drat
##   <dbl>    <dbl>
## 1     4     4.93
## 2     6     3.92
## 3     8     4.22
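The quote-first, unquote-later pattern has a rough base R counterpart: substitute() to capture and bquote() with .() to splice the captured expression into a new call. A sketch of that analogy only, not of how enquo and !! are implemented; quoting_wrapper and row_condition are hypothetical names:

```r
my_little_filter <- function(x, call) {
  call_quoted <- substitute(call)
  retain_row <- eval(call_quoted, envir = x)
  x[retain_row, ]
}

quoting_wrapper <- function(x, row_condition) {
  # Capture what the user typed ...
  condition_q <- substitute(row_condition)
  # ... splice it into the inner call, then evaluate that call.
  eval(bquote(my_little_filter(x, .(condition_q))))
}

four_cyl <- quoting_wrapper(mtcars, cyl == 4)
unique(four_cyl$cyl)
## [1] 4
```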
my_filter <- function(x, bare_call) {
call <- substitute(bare_call)
x[eval(call, envir = x), ]
}
my_filter(mtcars, cyl == 4) %>% head(1)
##    mpg cyl disp hp drat   wt  qsec vs am gear carb cyl_drat
## 3 22.8   4  108 93 3.85 2.32 18.61  1  1    4    1     7.85
cyl == 4 on its own is invalid: there is no cyl in the global environment.
substitute gets the expression, which is the quoted call.
The call is evaluated in x, where it finds the cyl column.
quasiquotation
quosures
environments
@edwin_thoen
github.com/EdwinTh/satRday
edwinth.github.io/blog/nse
edwinth.github.io/blog/dplyr-recipes